GESIS Workshop: Introduction to Geospatial Techniques for Social Scientists in R
Stefan Jünger & Dennis Abel
2025-04-09
Now
Day
Time
Title
April 09
10:00-11:30
Introduction
April 09
11:30-11:45
Coffee Break
April 09
11:45-13:00
Data Formats
April 09
13:00-14:00
Lunch Break
April 09
14:00-15:30
Mapping I
April 09
15:30-15:45
Coffee Break
April 09
15:45-17:00
Spatial Wrangling
April 10
09:00-10:30
Mapping II
April 10
10:30-10:45
Coffee Break
April 10
10:45-12:00
Applied Spatial Linking
April 10
12:00-13:00
Lunch Break
April 10
13:00-14:30
Spatial Autocorrelation
April 10
14:30-14:45
Coffee Break
April 10
14:45-16:00
Spatial Econometrics & Outlook
Mini wrap-up
Thus far, we have learned about
Different data formats
How to load them
First steps in interacting with them
Creating maps with tmap
But the real magic in geospatial data lies in their flexibility.
Our plan for this afternoon
In this session, you will learn
How to wrangle the different geospatial data formats even further
How to converse from one format into the other
Because we want to
Link/join different datasets
Do some spatial analysis
1. Advanced Importing
Importing of non-spatial data
Say the data we want to use are not available as a shapefile. Point coordinates are often stored in table formats like .csv – like the location of charging stations for electric cars data in our ./data folder.
We see that besides our attributes (e.g., operator, power,…), the table contains the two variables “longitude” (X) and “latitude” (Y), our point coordinates. When using the command sf::st_as_sf(), it is easy to transform the table into a point layer.
# transform to spatial data frameecharging_sf <- sf::st_as_sf( echarging_df |># there were some missings in my data which is not allowed dplyr::filter(!is.na(longitude) &!is.na(latitude)), coords =c("longitude", "latitude") )# inspect dataclass(echarging_sf)
[1] "sf" "tbl_df" "tbl" "data.frame"
Final check
plot(echarging_sf)
Check the CRS!
Make sure to use the option crs = [EPSG_ID]. If not used, your CRS will not be defined, and you can’t perform further commands depending on the CRS. Here, I tried EPSG IO or http://projfinder.com/ to find out.
echarging_sf <- echarging_df |># there were some missings in my data which is not allowed dplyr::filter(!is.na(longitude) &!is.na(latitude)) |> sf::st_as_sf( coords =c("longitude", "latitude"),crs =4326 )plot(echarging_sf["operator"])
… and the other way round
Do you want to go back to handling a simple data frame? You can quickly achieve this by dropping the geometry column.
Say, we’re interested in the accessibility of public transport in Cologne.
Bus, tram, etc.
All platforms and vehicles should be wheel-chair accessible
We can gather this information using OpenStreetMap!
Accessing OSM data: the Overpass API
The Overpass API (formerly known as OSM Server Side Scripting, or OSM3S before 2011) is a read-only API that serves up custom selected parts of the OSM map data. It acts as a database over the web: the client sends a query to the API and returns the data set that corresponds to the query.
With just two ‘simple’ commands, we can retrieve geospatial data about Cologne’s Park & Ride parking spaces in R. Not too bad, right?
tm_shape(park_and_ride) +tm_dots()
Raster data access
It is not only vector data that can be processed through these mechanisms.
The idea is the same for raster data.
Accessing the information through URLs
Just the downloaded data formats differ
Example: downloading German weather data
The German Weather Service provides excellent weather data in its Climate Data Center (CDC): https://opendata.dwd.de/climate_environment/CDC/. Let’s download some summer temperature data using a custom function.
While much of our previous data were points, we only had a column for the German federal states as administrative information so far. We’re now moving “a layer down” and looking at Germany on a more fine-grained spatial level: the district. We repeat what you already did in the exercise 1_3_1_Basic Maps: joining data like simple spreadsheets.
# load district shapefilegerman_districts <- sf::read_sf("./data/VG250_KRS.shp")# load district attributesattributes_districts <- readr::read_csv2("./data/attributes_districts.csv") # join datagerman_districts_enhanced <- german_districts |> dplyr::left_join(attributes_districts, by ="AGS")
Spatial join
But what can we do if we do not have a matching identifier? For example, there are no matching administrative identifiers in the German district data and the e-charger data.
german_districts_enhanced
Simple feature collection with 431 features and 29 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 431 × 30
ADE GF BSG ARS AGS SDV_ARS GEN BEZ IBZ BEM NBD SN_L
<int> <int> <int> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
1 4 4 1 01001 01001 0100100000… Flen… Krei… 40 -- ja 01
2 4 4 1 01002 01002 0100200000… Kiel Krei… 40 -- ja 01
3 4 4 1 01003 01003 0100300000… Lübe… Krei… 40 -- ja 01
4 4 4 1 01004 01004 0100400000… Neum… Krei… 40 -- ja 01
5 4 4 1 01051 01051 0105100440… Dith… Kreis 42 -- ja 01
6 4 4 1 01053 01053 0105301001… Herz… Kreis 42 -- ja 01
7 4 4 1 01054 01054 0105400560… Nord… Kreis 42 -- ja 01
8 4 4 1 01055 01055 0105500120… Osth… Kreis 42 -- ja 01
9 4 4 1 01056 01056 0105600390… Pinn… Kreis 42 -- ja 01
10 4 4 1 01057 01057 0105700570… Plön Kreis 42 -- ja 01
# ℹ 421 more rows
# ℹ 18 more variables: SN_R <chr>, SN_K <chr>, SN_V1 <chr>, SN_V2 <chr>,
# SN_G <chr>, FK_S3 <chr>, NUTS <chr>, ARS_0 <chr>, AGS_0 <chr>, WSK <date>,
# DEBKG_ID <chr>, geometry <MULTIPOLYGON [m]>, car_density <dbl>,
# ecar_share <chr>, publictransport_meandist <dbl>, population <dbl>,
# green_voteshare <dbl>, afd_voteshare <dbl>
One might be interested in only one specific area of Germany, like Cologne. To subset a sf object, you can often use your usual data wrangling workflow. In this case, I know the AGS ID, the only row I want to keep.
If you have no information about ids but only about the geolocation, you can use sf::st_touches() (or sf::st_within(), sf::st_intersect(), sf::st_crosses(), …) to identify, for example, all districts which share a border with Cologne.
We can use our raster data to subset our e-charging data. But first, we have to adjust the CRS again (use sf::st_crs(echarging_sf) to look up EPSG code):
As you know, we can subset vector data by simply filtering for specific attribute values. For example, to subset Cologne’s districts only by the one of Deutz, we can use the dplyr for sf data:
If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as before. There is nothing new to tell. In the raster data world, these operations are called raster extractions.
Extracting information from raster data
Raster data are helpful when we aim to
Apply calculations that are the same for all geometries in the dataset
Extract information from the raster fast and efficient
To extract the raster values at a specific point by location, we use the following:
terra::extract(immigrant_rate, echarging_sf_cologne, ID =FALSE)
immigrants_cologne
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 25.274725
12 25.274725
13 NA
14 NA
15 19.444444
16 NA
17 NA
18 NA
19 NA
20 NA
21 NA
22 NA
23 NA
24 10.000000
25 NA
26 NA
27 NA
28 NA
29 NA
30 5.389222
31 NA
32 NA
33 12.790698
34 8.737864
35 4.705882
36 NA
37 NA
38 NA
39 NA
40 NA
41 NA
42 NA
43 NA
44 NA
45 NA
46 NA
47 12.500000
48 NA
49 NA
50 29.729730
51 50.000000
52 NA
53 8.333333
54 8.333333
55 12.142857
56 NA
57 NA
58 NA
59 5.633803
60 17.647059
61 8.000000
62 NA
63 NA
64 10.526316
65 16.883117
66 21.739130
67 50.000000
68 NA
69 NA
70 NA
71 NA
72 NA
73 NA
74 NA
75 NA
76 NA
77 NA
78 NA
79 NA
80 NA
81 NA
82 NA
83 NA
84 NA
85 NA
86 NA
87 NA
88 9.677419
89 3.809524
90 NA
91 NA
92 NA
93 NA
94 NA
95 NA
96 NA
97 NA
98 NA
99 NA
100 NA
101 15.000000
102 7.894737
103 NA
104 18.000000
105 NA
106 NA
107 11.904762
108 16.666667
109 13.636364
110 NA
111 NA
112 NA
113 17.142857
114 6.818182
115 NA
116 NA
117 NA
118 17.142857
119 10.526316
120 NA
121 NA
122 33.043478
123 NA
124 13.888889
125 NA
126 NA
127 30.136986
128 NA
129 NA
130 26.086957
131 7.407407
132 7.500000
133 NA
134 14.285714
135 NA
136 75.000000
137 NA
138 15.789474
139 7.692308
140 10.828025
141 NA
142 NA
143 NA
144 NA
145 NA
146 NA
147 NA
148 NA
149 NA
150 NA
151 NA
152 NA
153 NA
154 NA
155 NA
156 NA
157 NA
158 NA
159 NA
160 NA
161 9.090909
162 NA
163 NA
164 NA
165 NA
166 NA
167 NA
168 NA
169 NA
170 29.729730
171 NA
172 NA
173 NA
174 NA
175 NA
176 17.857143
177 18.309859
178 18.309859
179 18.309859
180 18.309859
181 18.309859
182 18.309859
183 13.043478
184 13.043478
185 13.043478
186 13.043478
187 13.043478
188 13.043478
189 13.043478
190 13.043478
191 13.043478
192 6.629834
193 23.076923
194 18.181818
195 23.076923
196 23.076923
197 17.582418
198 7.951807
199 NA
200 16.858238
201 18.614719
202 16.814159
203 17.391304
204 21.904762
205 18.652850
206 9.375000
207 9.375000
208 NA
209 NA
210 NA
211 NA
212 NA
213 NA
214 NA
215 NA
216 NA
217 NA
218 NA
219 NA
220 NA
221 NA
222 NA
223 NA
224 NA
225 NA
226 NA
227 NA
228 NA
229 NA
230 NA
231 NA
232 NA
233 NA
234 NA
235 NA
236 10.582011
237 18.072289
238 8.679245
239 27.083333
240 11.976048
241 18.500000
242 13.281250
243 13.809524
244 7.594937
245 6.611570
246 12.195122
247 19.780220
248 NA
249 12.871287
250 15.384615
251 16.438356
252 22.857143
253 4.347826
254 NA
255 11.940299
256 NA
257 51.063830
258 20.720721
259 10.924370
260 NA
261 NA
262 NA
263 NA
264 NA
265 NA
266 NA
267 NA
268 NA
269 NA
270 NA
271 NA
272 NA
273 NA
274 NA
275 NA
276 NA
277 NA
278 NA
279 NA
280 NA
281 NA
282 NA
283 NA
284 NA
285 NA
286 NA
287 20.600858
288 15.636364
289 NA
290 NA
291 25.373134
292 11.956522
293 NA
294 NA
295 NA
296 25.773196
297 14.942529
298 14.534884
299 9.090909
300 30.400000
301 26.737968
302 15.853659
303 NA
304 20.000000
305 29.166667
306 NA
307 11.428571
308 27.192982
309 NA
310 NA
311 NA
312 19.473684
313 3.846154
314 9.090909
315 48.275862
316 14.110429
317 NA
318 14.210526
319 NA
320 26.315789
321 18.750000
322 6.000000
323 NA
324 6.382979
325 6.382979
326 8.602151
327 8.602151
328 NA
329 NA
330 33.834586
331 NA
332 26.785714
333 5.660377
334 29.411765
335 NA
336 NA
337 22.142857
338 NA
339 NA
340 26.885880
341 NA
342 30.303030
343 NA
344 NA
345 NA
346 13.043478
347 8.333333
348 8.333333
349 17.391304
350 7.878788
351 NA
352 NA
353 NA
354 5.714286
355 28.828829
356 NA
357 NA
358 NA
359 12.903226
360 12.000000
361 NA
362 NA
363 NA
364 NA
365 NA
366 NA
367 NA
368 NA
369 NA
370 NA
371 NA
372 NA
373 NA
374 26.315789
375 10.900474
376 28.571429
377 27.513228
378 27.513228
379 8.677686
380 24.686192
381 NA
382 NA
383 NA
384 NA
385 NA
386 NA
387 NA
388 NA
389 NA
390 NA
391 NA
392 NA
393 NA
394 NA
395 NA
396 NA
397 NA
398 NA
399 NA
400 NA
401 NA
402 NA
403 NA
404 NA
405 20.574163
406 2.684564
407 19.469027
408 10.077519
409 NA
410 NA
411 12.418301
412 23.529412
413 NA
414 15.000000
415 9.000000
416 8.943089
417 25.581395
418 25.581395
419 32.142857
420 NA
421 NA
422 NA
423 NA
424 NA
425 NA
426 NA
427 NA
428 NA
429 16.603774
430 NA
431 NA
432 NA
433 NA
434 NA
435 NA
436 NA
437 NA
438 NA
439 NA
440 NA
441 NA
442 NA
443 NA
444 NA
445 NA
446 NA
447 NA
448 NA
449 NA
450 NA
451 NA
452 NA
453 NA
454 NA
455 NA
456 NA
457 NA
458 NA
459 NA
460 NA
461 NA
462 NA
463 NA
464 NA
465 NA
466 NA
467 NA
468 NA
469 NA
470 NA
471 NA
472 NA
473 NA
474 NA
475 NA
476 NA
477 NA
478 NA
479 NA
480 NA
481 NA
482 NA
483 NA
484 NA
485 NA
486 NA
487 NA
488 NA
489 NA
490 NA
491 NA
492 NA
493 NA
494 NA
495 NA
496 NA
497 NA
498 NA
499 NA
500 NA
501 NA
502 NA
503 NA
504 NA
505 NA
506 NA
507 NA
508 NA
509 NA
510 NA
511 NA
512 NA
513 NA
514 NA
515 NA
516 NA
517 NA
518 NA
519 NA
520 NA
521 NA
522 NA
523 NA
524 NA
525 NA
526 NA
527 NA
528 NA
529 NA
530 NA
531 NA
532 NA
533 NA
534 8.823529
535 NA
536 NA
537 NA
538 NA
539 NA
540 NA
541 NA
542 NA
543 NA
544 NA
545 NA
546 NA
547 6.521739
548 NA
549 16.279070
550 10.169492
551 NA
552 NA
553 NA
554 NA
555 57.142857
556 NA
557 NA
558 NA
559 NA
560 NA
561 18.320611
562 NA
563 9.883721
564 NA
565 6.382979
566 NA
567 NA
568 NA
569 NA
570 NA
571 NA
572 NA
573 NA
574 NA
575 NA
576 NA
577 8.695652
578 60.000000
579 15.079365
580 NA
581 7.272727
582 6.818182
583 4.761905
584 3.333333
585 6.481481
586 6.285714
587 21.666667
588 10.416667
589 7.377049
590 10.526316
591 NA
592 NA
593 NA
594 21.276596
595 5.042017
596 10.650888
597 10.169492
598 NA
599 3.703704
600 43.767313
601 NA
602 NA
603 NA
604 NA
605 NA
606 NA
607 NA
608 8.000000
609 20.833333
610 5.555556
611 20.714286
612 12.121212
613 11.688312
614 23.809524
615 13.043478
616 23.004695
617 NA
618 NA
619 NA
620 24.324324
621 4.278075
622 15.131579
623 12.658228
624 24.742268
625 45.833333
626 NA
627 9.779180
628 11.666667
629 NA
630 NA
631 21.428571
632 21.428571
633 21.428571
634 21.428571
635 6.097561
636 26.595745
637 NA
638 12.121212
639 NA
640 NA
641 NA
642 26.114650
643 33.014354
644 54.237288
645 54.237288
646 19.444444
647 23.809524
648 50.000000
649 35.256410
650 NA
651 41.007194
652 11.267606
653 23.809524
654 NA
655 16.666667
656 NA
657 21.717172
658 20.670391
659 21.243523
660 NA
661 13.402062
662 28.125000
663 6.666667
664 6.666667
665 NA
666 20.661157
667 4.166667
668 23.076923
669 20.353982
670 16.000000
671 10.000000
672 11.607143
673 30.909091
674 NA
675 NA
676 NA
677 NA
678 NA
679 NA
680 NA
681 NA
682 NA
683 NA
684 NA
685 NA
686 NA
687 NA
688 NA
689 NA
690 NA
691 30.769231
692 20.430108
693 10.526316
694 46.590909
695 15.463918
696 NA
697 18.390805
698 5.217391
699 NA
700 5.555556
701 2.678571
702 37.404580
703 NA
704 26.395939
705 NA
706 5.555556
707 6.818182
708 4.347826
709 4.347826
710 33.043478
711 43.859649
712 33.039648
713 2.803738
714 7.500000
715 11.904762
716 NA
717 NA
718 32.467532
719 7.633588
720 13.861386
721 14.285714
722 NA
723 22.222222
724 12.643678
725 NA
726 NA
727 NA
728 4.761905
729 53.846154
730 NA
731 NA
732 NA
733 42.424242
734 11.111111
735 4.477612
736 6.250000
737 NA
738 NA
739 NA
740 42.857143
741 NA
742 NA
743 NA
744 NA
745 NA
746 NA
747 NA
748 NA
749 NA
750 NA
751 16.000000
752 NA
753 NA
754 NA
755 NA
756 NA
757 NA
758 NA
759 NA
760 NA
761 NA
762 NA
763 NA
764 NA
765 NA
766 NA
767 NA
768 NA
769 24.844720
770 NA
771 20.500000
772 NA
773 NA
774 31.034483
775 NA
776 NA
777 NA
778 NA
779 NA
780 NA
781 NA
782 NA
783 NA
784 NA
785 NA
786 NA
787 NA
788 NA
789 NA
790 18.750000
791 NA
792 NA
793 18.750000
794 18.750000
795 NA
796 NA
797 60.000000
798 NA
799 NA
800 NA
801 NA
802 NA
803 NA
804 NA
805 NA
806 NA
807 NA
808 NA
809 NA
810 NA
811 NA
812 NA
813 NA
814 NA
815 NA
816 25.925926
817 NA
818 7.563025
819 NA
820 61.111111
821 NA
822 NA
823 8.000000
824 NA
825 43.023256
826 NA
827 NA
828 20.289855
829 21.739130
830 NA
831 NA
832 NA
833 62.500000
834 NA
835 NA
836 NA
837 NA
838 NA
839 28.205128
840 NA
841 17.307692
842 12.837838
843 15.625000
844 27.741935
845 15.596330
846 42.000000
847 13.043478
848 48.648649
849 NA
850 20.833333
851 20.833333
852 7.936508
853 NA
854 NA
855 4.838710
856 4.838710
857 NA
858 NA
859 NA
860 NA
861 NA
862 NA
863 NA
864 NA
865 29.032258
866 NA
867 10.000000
868 NA
869 NA
870 27.027027
871 NA
872 NA
873 6.976744
874 NA
875 NA
876 4.687500
Add results to existing dataset
This information can be added to an existing dataset (our points in this example):
echarging_sf_cologne$immigrant_rate_value <- terra::extract(immigrant_rate, echarging_sf_cologne, ID =FALSE) |>unlist()echarging_sf_cologne
Simple feature collection with 876 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 343957.8 ymin: 5632513 xmax: 370664.4 ymax: 5661644
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 876 × 7
operator federal_state power_kw type num_plugs geometry
* <chr> <chr> <dbl> <chr> <dbl> <POINT [m]>
1 Cut! Energy… Nordrhein-We… 44 Norm… 2 (358799.3 5659476)
2 Kaufland Nordrhein-We… 93 Schn… 2 (352129.2 5661644)
3 Kaufland Nordrhein-We… 72 Schn… 3 (352135.4 5661628)
4 MEGA Kunden… Nordrhein-We… 44 Norm… 2 (352102.1 5661543)
5 MEGA Kunden… Nordrhein-We… 44 Norm… 2 (352102.1 5661543)
6 MEGA Kunden… Nordrhein-We… 44 Norm… 2 (352102.1 5661543)
7 MEGA Kunden… Nordrhein-We… 44 Norm… 2 (352102.1 5661543)
8 MEGA Kunden… Nordrhein-We… 44 Norm… 2 (352102.1 5661543)
9 Auto Thomas… Nordrhein-We… 39.6 Norm… 2 (352737.2 5633750)
10 EWE Go GmbH Nordrhein-We… 160 Schn… 2 (353129 5634718)
# ℹ 866 more rows
# ℹ 1 more variable: immigrant_rate_value <dbl>
More elaborated: spatial buffers
Sometimes, extracting information 1:1 is not enough.
It’s too narrow
There is missing information about the surroundings of a point
We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s reload our Cologne shapefile.
Points of interest data are nice for analyzing urban infrastructure. Let’s draw a quick ‘heatmap’ using observation densities.
echarging_sf_cologne_raster <- terra::rast( echarging_sf_cologne, res =1000 )echarging_sf_cologne_densities <- echarging_sf_cologne |> terra::rasterize( echarging_sf_cologne_raster, fun = length, background =0 )plot(echarging_sf_cologne_densities)
Backup / For Home Use
Focal statistics / spatial filter
Focal statistics are another method of including information near a point in space. However, it’s applied to the whole dataset and is independent of arbitrary points we project onto a map.
Relates focal cells to surrounding cells
Vastly used in image processing
But also applicable in social science research, as we will see